feat[gpu]: widen decimals for Arrow device export #8155
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 225.4 µs | 188.1 µs | +19.84% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.1 µs | 307.8 µs | -11.27% |
Comparing ad/arrow-device-decimal (867a83f) with develop (c005aae)
Arrow Decimal128/Decimal256 schemas require fixed 16/32-byte value buffers, while Vortex decimals may use narrower storage. Add a CUDA widening kernel for Arrow Device export and cover compact, wide, nullable, and empty decimal cases.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
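For readers unfamiliar with the widening step, here is a minimal host-side C++ sketch of the idea described above: each narrow signed value is sign-extended into a fixed 16-byte slot. It is not the CUDA kernel from this PR. The names `widen_to_16_bytes` and `widen_column` are hypothetical, and the little-endian two's-complement slot layout is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not from the PR): write one narrow signed value into
// a 16-byte slot as a little-endian two's-complement integer.
template <typename Narrow>
void widen_to_16_bytes(Narrow value, std::uint8_t* out) {
    static_assert(std::is_integral_v<Narrow> && std::is_signed_v<Narrow>,
                  "sketch only handles built-in signed integers");
    static_assert(sizeof(Narrow) <= 16, "input must not exceed the slot");

    // Sign extension: pre-fill the slot with 0xFF for negative values
    // and 0x00 otherwise.
    const std::uint8_t fill = value < 0 ? 0xFF : 0x00;
    std::memset(out, fill, 16);

    // Overwrite the low bytes with the value itself, least significant
    // byte first, using shifts so host endianness does not matter.
    using Unsigned = std::make_unsigned_t<Narrow>;
    const Unsigned bits = static_cast<Unsigned>(value);
    for (std::size_t i = 0; i < sizeof(Narrow); ++i) {
        out[i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }
}

// Hypothetical helper: widen a whole column of narrow values into a
// contiguous buffer of 16-byte slots. An empty input yields an empty buffer.
template <typename Narrow>
std::vector<std::uint8_t> widen_column(const std::vector<Narrow>& values) {
    std::vector<std::uint8_t> out(values.size() * 16);
    for (std::size_t i = 0; i < values.size(); ++i) {
        widen_to_16_bytes(values[i], out.data() + i * 16);
    }
    return out;
}
```

A GPU version would presumably assign one element per thread instead of looping, but the per-element logic would be the same. Validity (null) information is not touched by this step, since widening only changes the value buffer.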
    if constexpr (std::is_same_v<Input, int128_t>) {
        return value;
    } else if constexpr (std::is_same_v<Input, int256_t>) {
        return int128_t {value.parts[0], value.parts[1]};
Can this end up truncating? I think it is currently possible in Vortex to do this:

    let array = DecimalArray::from_iter(
        [i256::from_parts(0, 1)], // 2^128, so does not fit into i128
        DecimalDType::new(38, 0), // this is normally i128
    );

When exporting this array, we would pick the i256 -> i128 kernel and truncate the values without checking for overflow.

Probably the right fix is for Vortex to reject constructing such arrays.
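To make the concern concrete, here is a rough C++ sketch of what a checked narrowing could look like. It assumes a 256-bit value stored as four 64-bit limbs, least significant first, mirroring the `parts` indexing in the snippet above. `Int128Parts`, `Int256Parts`, and `checked_narrow` are hypothetical names for illustration, not types from this PR. The value fits in 128 bits exactly when the upper two limbs are the sign extension of the lower half.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical limb layouts (assumption: least significant limb first).
struct Int128Parts {
    std::uint64_t parts[2];
};
struct Int256Parts {
    std::uint64_t parts[4];
};

// Narrow a 256-bit two's-complement value to 128 bits, returning
// std::nullopt instead of silently dropping the upper limbs.
std::optional<Int128Parts> checked_narrow(const Int256Parts& v) {
    // The sign of the would-be 128-bit result is the top bit of limb 1.
    const bool negative = (v.parts[1] >> 63) != 0;
    const std::uint64_t extension = negative ? ~std::uint64_t{0} : 0;

    // The upper half must be pure sign extension; otherwise the value
    // lies outside the signed 128-bit range.
    if (v.parts[2] != extension || v.parts[3] != extension) {
        return std::nullopt;
    }
    return Int128Parts{{v.parts[0], v.parts[1]}};
}
```

With this check, the 2^128 value from the example (limbs `{0, 0, 1, 0}`) is rejected rather than truncated to zero. The check only covers storage width; it does not validate that the value respects the declared decimal precision, which is a tighter bound for precision 38.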